ABSTRACT

Expanding upon the work of Way & Srivastava (2006), we demonstrate how the use of training sets of comparable size continues to make Gaussian process regression (GPR) a competitive approach to that of neural networks and other least-squares fitting methods. This is possible via new large-size matrix inversion techniques developed for Gaussian processes (GPs) that do not require that the kernel matrix be sparse. This development, combined with a neural-network kernel function, appears to give superior results for this problem. Our best-fit results for the Sloan Digital Sky Survey (SDSS) Main Galaxy Sample using the u, g, r, i, z filters give an rms error of 0.0201, while our results for the same filters in the luminous red galaxy sample yield 0.0220. We also demonstrate that there appears to be a minimum number of training-set galaxies needed to obtain the optimal fit when using our GPR rank-reduction methods. We find that the morphological information included with many photometric surveys appears, for the most part, to make the photometric redshift evaluation slightly worse rather than better. This would indicate that most morphological information simply adds noise from the GP point of view in the data used herein. In addition, we show that cross-match catalog results involving combinations of the Two Micron All Sky Survey, SDSS, and Galaxy Evolution Explorer have to be evaluated in the context of the resulting cross-match magnitude and redshift distribution. Otherwise one may be misled into overly optimistic conclusions.

Subject headings: galaxies: distances and redshifts - methods: statistical 



1. Introduction 



General approaches to calculating photometric redshifts from broad band photometric data have been discussed elsewhere recently (Way & Srivastava 2006, hereafter Paper I). These involve template-based approaches and what are referred to as training-set approaches. In this paper we expand upon the training-set approaches outlined in Paper I using Gaussian processes (GPs). Previously we were limited to training-set sizes of order 1000 because a matrix inversion of order 1000 x 1000 was required for calculating the GPs. Part of the limitation was due to the amount of single-thread accessible RAM on our circa 2005 32-bit computers, meaning that one could not invert a matrix larger than about O(1000 x 1000) in size at one time within Matlab, our choice for implementing GPs. Today one can use commodity 64-bit workstations and invert matrices of O(20000) within Matlab. However, even this is a small fraction of the total potential size of today's photometric redshift training sets. For this reason we have developed new non-sparse rank-reduction matrix inversion techniques that allow one to use over 100,000 training samples. From this work we demonstrate that the new rank-reduction methods only require approximately 30,000-40,000 samples to get the optimal possible fit from GPs on Sloan Digital Sky Survey (York et al. 2000, SDSS) data.



Since Paper I several new approaches to galaxy photometric redshifts from broad band photometry have come about, along with expansion and refinement of previously published methods. Below is a summary of some of these approaches.



Kurtz et al. (2007) have used the Tolman surface brightness test ($\mu$-PhotoZ) using the relation $\mu \sim (1+z)^{-4}$, where $\mu$ is the galaxy surface brightness in the SDSS r band via the 50% Petrosian (1976) radii (petroRad50_r): $\mu$ = petroMag_r + 2.5(0.798 + 2 log(petroRad50_r)), and the galaxy r-i colors to pick the red galaxies this method is intended for. The Petrosian radii may add useful information because of the angular diameter distance relation. We also find this to be the case for GPs as discussed in Section 6 below.
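The surface brightness expression quoted above is simple enough to evaluate directly. The following is a minimal sketch, not code from Kurtz et al. (2007); the function name and the example magnitude and radius are hypothetical, and the logarithm is assumed to be base 10 (the constant 0.798 is close to log10(2 pi), which is consistent with that reading).

```python
import math


def petrosian_surface_brightness(petro_mag_r, petro_rad50_r):
    """Evaluate mu = petroMag_r + 2.5 * (0.798 + 2 * log10(petroRad50_r)).

    petro_mag_r   : Petrosian r-band magnitude
    petro_rad50_r : Petrosian 50% light radius in the r band (must be > 0)
    """
    if petro_rad50_r <= 0:
        raise ValueError("petroRad50_r must be positive")
    return petro_mag_r + 2.5 * (0.798 + 2.0 * math.log10(petro_rad50_r))


# Hypothetical galaxy: r = 17.0 mag with a 50% light radius of 2.5 (illustrative values only).
mu_example = petrosian_surface_brightness(17.0, 2.5)
```

A larger radius at fixed magnitude gives a numerically larger (fainter) surface brightness, as expected for light spread over a larger area.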



Carliles et al. (2008) have used Random Forests (ensembles of classification and regression trees) to estimate photometric redshifts from the SDSS. Like GPs (see Paper I) this method is also supposed to give realistic individual galaxy photometric redshift error estimates and few or no catastrophic photometric redshift prediction failures. Ball et al. (2008) continue their work using machine learning methods to derive photometric redshifts for galaxies and quasars using the SDSS and the Galaxy Evolution Explorer (GALEX, Martin et al. 2005). In particular, they have made interesting progress in eliminating catastrophic failures in quasar photo-z estimation while bringing down the rms error (RMSE) values. Work by Kaczmarczik et al. (2009) uses astrometric information to break degeneracies in quasar photometric redshifts, which may also be applied to other kinds of data.



Wray & Gunn (2008) have taken a Bayesian approach using the SDSS apparent magnitude colors u-g, g-r, r-i, i-z, surface brightness $\mu_i$ in the i band, the Sersic n-index (Sersic 1968), and the absolute magnitude $M_i$ "corrected" to z=0.1. Some of these quantities are only available from the New York University Value-Added Galaxy Catalog (NYU-VAGC) of Blanton et al. (2005) or calculated from the raw photometry directly. Wang et al. (2008) have used support vector machines (also see Wadadekar 2005) and kernel regression on a SDSS and Two Micron All Sky Survey (2MASS, Skrutskie et al. 2006) cross-match list.



D'Abrusco et al. (2007) utilized a supervised neural network using a standard multilayer perceptron, but operated in a Bayesian framework on two different SDSS datasets. One of their data sets consists of the SDSS Data Release Five (DR5, Adelman-McCarthy et al. 2007) luminous red galaxy (LRG) sample (Eisenstein et al. 2001), and the other, which they term the "General Galaxy sample", includes all objects classified as "GALAXY" in the SDSS. They then break their sample up into two redshift ranges and, after some interpolation fit to the residuals, they obtain impressive results, especially for the LRG sample (see their Table 4). In a higher redshift study Stabenau et al. (2008) used surface brightness priors to improve their template-based scheme for photometric redshifts in the VVDS (Le Fevre et al. 2003) and GOODS (Giavalisco et al. 2004) surveys.



This certainly does not cover all of the recent work in this field, but is a representative sample to show the intense interest being generated because of near-future large-area multi-band surveys like the Large Synoptic Survey Telescope (LSST, Ivezic et al. 2008) and PanStarrs (Kaiser et al. 2002).



We have used a variety of datasets in our analysis, which are discussed in Section 2. Discussion of the photometric and spectroscopic quality of the datasets, along with other photometric pipeline output properties of interest, is found in Section 3. The methods used to obtain photometric redshifts are in Section 4. How to pick the optimal sample size, matrix rank, and inversion method is covered in Section 5. Results are in Section 6 and Conclusions in Section 7.



2. The Sloan Digital Sky Survey, the Two Micron All Sky Survey and the Galaxy Evolution Explorer Datasets



Most of the work herein utilizes the SDSS Main Galaxy Sample (MGS, Strauss et al. 2002) and the LRG sample (LRG, Eisenstein et al. 2001) from the SDSS Data Release Three (DR3, Abazajian et al. 2005) and DR5 (Adelman-McCarthy et al. 2007). We include the DR3 to facilitate comparison between the present work and that from Paper I. We also utilize the DR5 to maximize the size of our cross-match catalogs.

For comparison with other work we have cross-matched the SDSS datasets with both the 2MASS extended source catalog and GALEX Data Release 4 (GR4) All Sky Survey photometric attributes. Our method of cross-matching these catalogs has not changed since Paper I except that we now cross-match against the SDSS DR5 instead of the DR3 to increase the size of our catalogs. Many aspects of the SDSS, 2MASS, and GALEX surveys relevant to this work were described in Paper I and hence we will not repeat them here. The only new catalog included since Paper I is the SDSS LRG. The SDSS LRG sample is similar to the SDSS MGS except that it explicitly targets the LRGs. These galaxies have a fairly uniform spectral energy distribution (SED) and a strong 4000 Å break, which tend to make calculating photometric redshifts easier than for the MGS (e.g., Padmanabhan et al. 2005) since the training set contains more homogeneous SEDs. Since these galaxies are among the most luminous galaxies in the universe and tend to be found in overdense regions (e.g., clusters/groups of galaxies), they are also good candidates for mapping the largest scales in the universe; see Eisenstein et al. (2001) for more details.



3. Photometric and redshift quality, morphological indicators and other catalog properties

For SDSS photometric and redshift quality we follow much the same recipe as in Paper I. However, unlike Paper I we refrain from using SDSS photometry of the highest quality (what we referred to as "GREAT"), as we did not see any consistent improvements in our regression fits using this higher quality photometry. We stick with the SDSS photometric "GOOD" flags as defined in Paper I: !BRIGHT and !BLENDED and !SATURATED. See Table 2 in Paper I for a description of the flags. We utilize the same photometric quality flags for the GALEX and 2MASS datasets as described in Paper I, Section 3. We incorporate the same SDSS morphological indicators as in our previous work (see Paper I, Section 3.5). The SDSS casjobs queries used to get the data are the same as those in the Appendix of Paper I except in the case of the LRGs utilized herein, which require primtarget=TARGET_GALAXY_RED (p.primtarget & 0x00000020 > 0) instead of primtarget=TARGET_GALAXY (p.primtarget & 0x00000040 > 0) for the MGS.

Tables 1 and 2 contain a comprehensive list of the six data sets used herein.



4. Improved Gaussian Process Methods 



In this section we will discuss our investigation of different GP transfer functions (kernels) and rank-reduction matrix inversion techniques. Our results suggest that there may be an upper limit to the number of training-set galaxies needed to derive photometric redshifts using the SDSS, but this result should be viewed with caution. While there have been recent suggestions that one may quantify the maximum number of galaxies required to obtain an optimal fit (Bernstein & Huterer 2009), in practice what we see with the GPs could be an artifact of the algorithm itself. In particular, it might be desirable to explore building good "local" models to compare with the present GPs (and neural networks), which are global
Table 1. Data Sets 1-3

Data Set 1^a                        Data Set 2                      Data Set 3
SDSS-DR3 MGS                        SDSS-DR5 LRG                    SDSS-DR3 MGS + GALEX-GR4
Training=180045, Testing=20229^b    Training=87002, Testing=9666    Training=30036, Testing=3374

g-r-i                               g-r-i                           g-r-i
u-g-r-i                             u-g-r-i                         u-g-r-i
g-r-i-z                             g-r-i-z                         g-r-i-z
u-g-r-i-z                           u-g-r-i-z                       u-g-r-i-z
...                                 ...                             nuv-fuv-g-r-i
...                                 ...                             nuv-fuv-u-g-r-i
...                                 ...                             nuv-fuv-g-r-i-z
...                                 ...                             nuv-fuv-u-g-r-i-z
u-g-r-i-z-p50                       u-g-r-i-z-p50                   nuv-fuv-u-g-r-i-z-p50
u-g-r-i-z-p50-p90                   u-g-r-i-z-p50-p90               nuv-fuv-u-g-r-i-z-p50-p90
u-g-r-i-z-p50-p90-ci                u-g-r-i-z-p50-p90-ci            nuv-fuv-u-g-r-i-z-p50-p90-ci
u-g-r-i-z-p50-p90-ci-qr             u-g-r-i-z-p50-p90-ci-qr         nuv-fuv-u-g-r-i-z-p50-p90-ci-qr
u-g-r-i-z-p50-p90-fd                u-g-r-i-z-p50-p90-fd            nuv-fuv-u-g-r-i-z-p50-p90-fd
u-g-r-i-z-p50-p90-fd-qr             u-g-r-i-z-p50-p90-fd-qr         nuv-fuv-u-g-r-i-z-p50-p90-fd-qr

^a u-g-r-i-z = 5 SDSS magnitudes, p50 = Petrosian 50% light radius in the SDSS r band, p90 = Petrosian 90% light radius in the r band, ci = Petrosian inverse concentration index, fd = FracDev value, qr = Stokes Q value in the r band, nuv = GALEX Near UV band, fuv = GALEX Far UV band; see Paper I Section 3.6 for more details.

^b These are the sizes of the testing and training sets used in our analysis.



Table 2. Data Sets 4-6

Data Set 4^a                        Data Set 5                        Data Set 6
SDSS-DR5 LRG + GALEX-GR4            SDSS-DR5 MGS + 2MASS              SDSS-DR5 LRG + 2MASS
Training=4042, Testing=454          Training=133947, Testing=15050    Training=39344, Testing=4420

g-r-i                               g-r-i                             g-r-i
u-g-r-i                             u-g-r-i                           u-g-r-i
g-r-i-z                             g-r-i-z                           g-r-i-z
u-g-r-i-z                           u-g-r-i-z                         u-g-r-i-z
nuv-fuv-g-r-i                       g-r-i-j-h-k                       g-r-i-j-h-k
nuv-fuv-u-g-r-i                     u-g-r-i-j-h-k                     u-g-r-i-j-h-k
nuv-fuv-g-r-i-z                     g-r-i-z-j-h-k                     g-r-i-z-j-h-k
nuv-fuv-u-g-r-i-z                   u-g-r-i-z-j-h-k                   u-g-r-i-z-j-h-k
nuv-fuv-u-g-r-i-z-p50               ...                               ...
nuv-fuv-u-g-r-i-z-p50-p90           ...                               ...
nuv-fuv-u-g-r-i-z-p50-p90-ci        ...                               ...
nuv-fuv-u-g-r-i-z-p50-p90-ci-qr     ...                               ...
nuv-fuv-u-g-r-i-z-p50-p90-fd        ...                               ...
nuv-fuv-u-g-r-i-z-p50-p90-fd-qr     ...                               ...

^a u-g-r-i-z = 5 SDSS magnitudes, p50 = Petrosian 50% light radius in the SDSS r band, p90 = Petrosian 90% light radius in the r band, ci = Petrosian inverse concentration index, fd = FracDev value, qr = Stokes Q value in the r band, nuv = GALEX Near UV band, fuv = GALEX Far UV band, j = 2MASS j band, h = 2MASS h band, k = 2MASS k band; see Paper I Section 3.6 for more details.
models.



In the GP method utilized herein one would begin with a training-set matrix $X$ of dimensions $n \times d$, where $n$ is the number of galaxies and $d$ is the number of components, which might include broad band flux measurements and morphological information. One would also have a target vector $y$ of dimensions $n \times 1$, which would contain the known redshift for each galaxy in our case. The testing data are in a matrix $X^*$ of dimension $n^* \times d$ with target values in a vector $y^*$ consisting of $n^* \times 1$ redshifts, where $n^*$ is the number of test samples. We wish to predict the value of $y^*$ given $X$, $y$, and $X^*$. The prediction of $y^*$ requires a covariance function $k(x, x')$, with $x$ and $x'$ vectors with $d$ components. This covariance function can be used to construct an $n \times n$ covariance matrix $K$, where $K_{ij} = k(x_i, x_j)$ for rows $x_i$ and $x_j$ of $X$, and the $n^* \times n$ cross-covariance matrix $K^*$ (with $K^*_{ij} = k(x^*_i, x_j)$, where $x^*_i$ is the $i$th row of $X^*$). Once this is accomplished, the prediction $\hat{y}^*$ for $y^*$ may be given by the GP equation (Rasmussen & Williams 2006, p. 17):

$$\hat{y}^* = K^* (\lambda^2 I + K)^{-1} y \qquad (1)$$

where $\lambda$ represents the noise in $y$ and can be used to improve the quality of the model (Rasmussen & Williams 2006).



In addition to the prediction, the GP approach also leads to an equation for $C$, the covariance matrix for the predictions in Equation (1). If the $n^* \times n^*$ matrix $K^{**}$ has entries $K^{**}_{ij} = k(x^*_i, x^*_j)$ then (Rasmussen & Williams 2006, p. 79):

$$C = K^{**} - K^* (\lambda^2 I + K)^{-1} K^{*T} \qquad (2)$$

The superscript $T$ indicates the transpose. The pointwise variance of the prediction is diag($C$), the diagonal of the $n^* \times n^*$ matrix $C$.

For details about the selection of $\lambda$, the covariance function (kernel) $k$, hyperparameters in the kernel, and GPR in general see Foster et al. (2009) and Rasmussen & Williams (2006). The following discussion is a summary of Foster et al. (2009). We will use the above notation for the sections that follow.
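Equations (1) and (2) can be illustrated on a toy problem. The sketch below is not the authors' Matlab implementation: it uses plain Python lists, a small Gaussian-elimination solver in place of a library routine, and a squared exponential kernel purely for illustration. All function names and the example data are hypothetical. Rather than forming the inverse explicitly, it solves $(\lambda^2 I + K) W = [y \mid K^{*T}]$ once and reuses the solution for both the prediction and its pointwise variance.

```python
import math


def solve(A, B):
    """Solve A X = B (A is n x n, B is n x k) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row for column c
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, len(M[r])):
                M[r][j] -= f * M[c][j]
    k = len(B[0])
    X = [[0.0] * k for _ in range(n)]
    for i in reversed(range(n)):  # back substitution
        for j in range(k):
            s = M[i][n + j] - sum(M[i][t] * X[t][j] for t in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X


def se_kernel(a, b, length=1.0):
    """Squared exponential kernel, used here only as a simple stand-in."""
    r2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-r2 / (2.0 * length ** 2))


def gp_predict(X, y, Xs, kernel, lam):
    """Return (mean, variance) lists following Equations (1) and (2)."""
    n, ns = len(X), len(Xs)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    Ks = [[kernel(Xs[s], X[j]) for j in range(n)] for s in range(ns)]
    M = [[K[i][j] + (lam ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Right-hand sides: column 0 is y, columns 1..ns are the columns of Ks^T.
    rhs = [[y[i]] + [Ks[s][i] for s in range(ns)] for i in range(n)]
    W = solve(M, rhs)
    mean = [sum(Ks[s][i] * W[i][0] for i in range(n)) for s in range(ns)]
    # diag(C): k(x*_s, x*_s) minus the s-th diagonal entry of Ks M^{-1} Ks^T.
    var = [kernel(Xs[s], Xs[s]) - sum(Ks[s][i] * W[i][1 + s] for i in range(n))
           for s in range(ns)]
    return mean, var


# Toy data (illustrative only): one input component, three "galaxies".
X_train = [[0.0], [1.0], [2.0]]
z_train = [0.1, 0.2, 0.3]
mean, var = gp_predict(X_train, z_train, [[1.5]], se_kernel, lam=0.05)
```

The dense solve costs $O(n^3)$ operations, which is exactly the bottleneck that motivates the rank-reduction methods of Section 4.2.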



4.1. Different Kernel choices 

In Paper I we relied exclusively on a polynomial kernel, but to investigate the possibility 
that other kernels might perform better we have tried several other common forms in the 
meantime. 
The squared exponential (SE) kernel function (also known as the 'radial basis' kernel function) is given by

$$k_{SE}(r) = \exp\left(-\frac{r^2}{2\ell^2}\right) \qquad (3)$$

where $r = |x - x'|$ and $\ell$ is the length scale. The length scale determines the rate at which the kernel function drops to zero away from the origin. This covariance function is infinitely differentiable and hence is very smooth. Because it is so smooth, it can sometimes be unrealistic for use in modeling real physical processes.



The Matern class covariance function is given by

$$k_{Matern}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{\ell}\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}\, r}{\ell}\right) \qquad (4)$$

where $\nu$ and $\ell$ are positive parameters and $K_{\nu}$ is a modified Bessel function. As $\nu \to \infty$ this reduces to the SE above. The process becomes very non-smooth for $\nu = 1/2$ and for values of $\nu$ > |, the function is as rough as noise. The Matern class covariance function is mean square differentiable $k$ times if and only if $\nu > k$. The Matern class of covariance functions can be used to model real physical processes and is more realistic than the above SE covariance function.

The rational quadratic covariance function is given by

$$k_{RQ}(r) = \left(1 + \frac{r^2}{2\alpha\ell^2}\right)^{-\alpha} \qquad (5)$$

As the value of the parameter $\alpha \to \infty$ this reduces to the SE function described earlier. Unlike the Matern class covariance function, this function is mean square differentiable for every value of $\alpha$.

The polynomial covariance function is given by

$$k(x, x') = (\sigma_0^2 + x^T \Sigma_p x')^p \qquad (6)$$

where $\Sigma_p$ is a positive semidefinite matrix and $p$ is a positive integer. If $\sigma_0^2 = 0$ the kernel is homogeneous and linear, otherwise it is inhomogeneous. In principle this function may not be suitable for regression problems as the variance grows with $|x|$ for $|x| > 1$. However, there are applications where it has turned out to be effective (Rasmussen & Williams 2006).

The neural network covariance function is given by

$$k_{NN}(x, x') = \frac{2}{\pi} \sin^{-1}\left(\frac{2 x^T \Sigma x'}{\sqrt{(1 + 2 x^T \Sigma x)(1 + 2 x'^T \Sigma x')}}\right) \qquad (7)$$






This covariance function is named after neural networks because the function can be derived from the limiting case of a model of a neural network (Neal 1996).

In our calculations we chose $\Sigma$, which scales as the training-set data, to have the form $I/\ell^2$, where $I$ is a $d \times d$ identity matrix. The hyperparameters $\ell$ and $\lambda$ were selected by finding a (local) maximum to the marginal likelihood using the routine minimize from Rasmussen & Williams (2006, pp. 112-116, 221).



Two or more covariance functions can be combined to produce a new covariance function. For example sums, products, convolutions, tensor products and other combinations of covariance functions can be used to form new covariance functions. Details are described in Rasmussen & Williams (2006).



For the calculations shown in the rest of the paper we utilized Equation (7), the neural network kernel, since for our data it outperformed all other kernels.
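As a concrete illustration of the neural network kernel of Equation (7) with $\Sigma = I/\ell^2$, the following minimal sketch evaluates it for two input vectors. It assumes the inputs are used as given (no augmented bias component), which matches the $d \times d$ form of $\Sigma$ stated above but is otherwise an assumption; the function name and example vectors are hypothetical.

```python
import math


def nn_kernel(x, xp, length=1.0):
    """Neural network covariance of Equation (7) with Sigma = I / length**2."""
    s = 1.0 / length ** 2

    def form(a, b):
        # a^T Sigma b for the isotropic choice Sigma = I / length**2
        return s * sum(ai * bi for ai, bi in zip(a, b))

    num = 2.0 * form(x, xp)
    den = math.sqrt((1.0 + 2.0 * form(x, x)) * (1.0 + 2.0 * form(xp, xp)))
    # |num / den| < 1 by the Cauchy-Schwarz inequality, so asin is always defined.
    return (2.0 / math.pi) * math.asin(num / den)


# Two hypothetical five-component inputs (e.g., scaled u, g, r, i, z values).
a = [0.3, 0.1, -0.2, 0.0, 0.4]
b = [0.2, 0.0, -0.1, 0.1, 0.5]
k_ab = nn_kernel(a, b, length=2.0)
```

Unlike the SE kernel, this function depends on the inputs themselves rather than only on their separation, so it is not stationary; whether that property is what makes it perform best on these data is not something the sketch addresses.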



4.2. Low Rank Approximation Matrix Inversion Techniques 



As mentioned in Paper I (Section 4.4), to utilize GPR the inversion of the matrix $M = (\lambda^2 I + K)$ in Equation (1) is required. This matrix turns out to be an $n \times n$ non-sparse matrix, where $n$ is the number of training-set galaxies. Paper I mentioned that matrix inversion requires $O(n^3)$ floating point operations. Thus, to accommodate the matrix in memory and to keep the computation feasible, we kept $n < 1000$ in Paper I.

This was a severe shortcoming for GPs since they had 1-2 orders of magnitude fewer training samples to work with than all of the other methods described in Paper I. Nonetheless, GPs performed extremely well within this limitation.

Since writing Paper I, we have developed a variety of rank-reduction methods to invert large non-sparse matrices. These make GPR much more competitive than was shown in Paper I. Foster et al. (2009) outline the rank-reduction methods utilized in detail, so we provide a brief summary of their advantages below.

Note that the number of samples, $n$, is the same as that described above, while the rank, $m < n$, is the size of the rank-reduced matrix. We typically keep $m \leq 1500$ to keep the number of operations needed to invert the matrices manageable in wall-clock time. Memory usage for the methods below is $O(nm)$.

SR-N: the subset of regressors method. This method has been proposed and utilized in the past (Rasmussen & Williams 2006; Wahba 1990; Poggio & Girosi 1990) and requires $nm^2$ flops to invert. However, this method is known to have problems with numerical stability. That problem is addressed in the methods below.

SR-Q: the subset of regressors using a QR factorization. The use of the QR factorization (Golub & Van Loan 1996, p. 239) is designed to reduce computer arithmetic errors in the SR-N method. This method requires $2nm^2$ flops to invert. Therefore, it is a little more expensive than SR-N.

SR-V: the V method. Since this method in combination with pivoting (see below) is the one we utilize the most in later aspects of this paper we will go into a little more depth here. From Section 4, Equation (1), we recall that the size of $(\lambda^2 I + K)^{-1}$ is $n \times n$ and, as mentioned above, for large $n$ it is not practical to calculate $(\lambda^2 I + K)^{-1}$ directly. To get around this we will approximate $K$ with $VV^T$, where $V$ is an $n \times m$ matrix produced by partial Cholesky factorization (see Foster et al. 2009). Let $K^*_1$ be the first $m$ columns of $K^*$ and let $V_{11}$ be the $m \times m$ matrix of the first $m$ rows of $V$, where $m < n$. Then let $V^* = K^*_1 V_{11}^{-T}$. In addition to replacing $K$ with $VV^T$ we can also approximate $K^*$ with $V^* V^T$. With these substitutions one sees that $K^* (\lambda^2 I + K)^{-1} y$ from Equation (1) can be approximated by $V^* V^T (\lambda^2 I + VV^T)^{-1} y$. It turns out that this can also be written as $\hat{y}^* = V^* (\lambda^2 I + V^T V)^{-1} V^T y$. The matrix $(\lambda^2 I + V^T V)^{-1}$ is now $m \times m$ instead of $n \times n$, and for small enough $m$ the equation can be solved quite quickly. The new flop count will be $O(nm^2)$.

This method is intermediate in terms of growth of computer arithmetic errors between the normal equations and the SR-Q method, but in general the accuracy is close to that of SR-Q. This method was first discussed by Seeger et al. (2003) and Wahba (1990, p. 136).
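The rewriting step in the SR-V description, which moves the inverse from an $n \times n$ matrix to an $m \times m$ matrix, rests on a standard matrix identity. The short derivation below is a sketch added for clarity rather than a reproduction of the argument in Foster et al. (2009). Here $I_n$ and $I_m$ denote identity matrices of size $n$ and $m$ (notation introduced only for this derivation), and $\lambda \neq 0$ is assumed so that both inverses exist.

```latex
\begin{align*}
V^T (\lambda^2 I_n + V V^T)
  &= \lambda^2 V^T + V^T V V^T
   = (\lambda^2 I_m + V^T V)\, V^T \\
\Rightarrow\quad
V^T (\lambda^2 I_n + V V^T)^{-1}
  &= (\lambda^2 I_m + V^T V)^{-1} V^T \\
\Rightarrow\quad
V^* V^T (\lambda^2 I_n + V V^T)^{-1} y
  &= V^* (\lambda^2 I_m + V^T V)^{-1} V^T y
\end{align*}
```

The second line follows by multiplying the first on the left by $(\lambda^2 I_m + V^T V)^{-1}$ and on the right by $(\lambda^2 I_n + V V^T)^{-1}$. Forming $V^T V$ costs of order $nm^2$ operations, which dominates the $m^3$ cost of the small solve when $m \ll n$.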



SR-NP, SR-QP, SR-VP: the use of pivoting with rank-reduction methods. All of the previous methods use the first $m$ columns of $K$, but one can select any subset of the columns to construct a low-rank approximation. Selecting these columns is part of the problem to be solved. Our approach is similar to that of Fine & Scheinberg (2001).



Pivoting is useful in forming a numerically stable low-rank approximation of a positive semi-definite matrix, and to do so it identifies the rows of the training data which limit the growth of computer arithmetic errors. A pivot of the matrix $K$, which is simply a permutation of $K$ of the form $PKP^T$, corresponds to the permutation $PX$ of $X$. It is possible to move columns and rows of $K$ so that the $m \times m$ leading principal submatrix of $PKP^T$ has a condition number that is a function of $n$ and $m$. Thus pivoting will tend to construct a low-rank approximation whose condition number is related to the condition number of the low-rank approximation produced by the singular-value decomposition. The growth of computer arithmetic errors in the algorithm, in turn, depends on the condition number of the low-rank approximation. Since pivoting limits the condition number and the growth of computer arithmetic errors depends on the condition number, pivoting will tend to improve the numerical stability of the algorithm. This can, in principle, reduce the effect of computer arithmetic errors. If computer arithmetic errors are larger than the other errors (such as measurement errors and modelling errors) in the prediction of the redshift, then an algorithm incorporating pivoting may potentially be more accurate than an algorithm without pivoting.



Examples 2-4 in Foster et al. (2009) illustrate some of the dangers of not pivoting and how they are resolved with pivoting for small (artificial) problems.

In the end adding pivoting increases SR-N to $2nm^2$ flops and SR-Q to $3nm^2$, while SR-V stays the same.
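To put the flop counts quoted in this section side by side, the sketch below tabulates them for a given sample size $n$ and rank $m$. It is only a back-of-the-envelope aid: the function name is hypothetical, the full inversion is taken as exactly $n^3$ although the text gives only $O(n^3)$, and the SR-V count is taken as $nm^2$ although the text gives only $O(nm^2)$, so the unit leading constants for those two entries are assumptions.

```python
def flop_estimates(n, m):
    """Leading-order flop counts quoted in Section 4.2 for n samples and rank m."""
    nm2 = n * m ** 2
    return {
        "full": n ** 3,      # O(n^3); leading constant of 1 is an assumption
        "SR-N": nm2,
        "SR-Q": 2 * nm2,
        "SR-V": nm2,         # O(n m^2); leading constant of 1 is an assumption
        "SR-NP": 2 * nm2,
        "SR-QP": 3 * nm2,
        "SR-VP": nm2,        # "stays the same" as SR-V
    }


# Example: 40,000 training samples at rank 800.
est = flop_estimates(40000, 800)
speedup_vs_full = est["full"] / est["SR-VP"]  # equals (n / m)**2 under these assumptions
```

Under these assumptions the saving relative to a full inversion scales as $(n/m)^2$, which is why the rank-reduced methods remain tractable well beyond the sample sizes a direct inversion can handle.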



5. Comparison: Picking the optimal Sample Size, Rank size, and Matrix Inversion Method

Here we investigate Data Set 1 in detail in order to discern a variety of things, including: is there an optimal sample size for a given survey; what is the best matrix inversion method; and, if using rank-reduction methods, what is the optimal rank size? When discussing conventional matrix inversion, we will be limited to a maximum of 20,000 training samples.

Figures 1 and 2 show the variation of RMSE and calculation time versus sample size.
For the GP method (which is labeled GPR and is in yellow), this involved a full matrix 
inversion up to 20,000 training-set samples. The rest of the curves are from the other rank- 
reduction matrix inversion techniques and are labeled as described in the previous section. 
Several features are apparent: 

1. The SR-N method does not perform well in comparison to any of the other techniques. 
However, it does invert its matrices much faster than the standard matrix inversion 
technique. 

2. Except for the SR-N method, all of the other rank-reduction methods outperform the full matrix inversion in the range of 10,000-20,000 samples.

3. The rank-reduction methods with pivoting slightly outperform the non-pivoting methods in terms of lower RMSE values. However, the pivoting methods take much more time to do the matrix inversions than the non-pivoting methods.



4. More training-set samples give lower RMSE values. By around 40,000 samples the curves start to level off regardless of the rank size.

5. Larger rank sizes clearly give better performance in terms of lower RMSE for a given sample size. This is described in more detail below.

Footnote: The 20,000-sample limit on conventional matrix inversion is due to memory (RAM) limitations. Our 64-bit compute platform is based around a 2 x 2.66 GHz Dual-Core Intel Xeon with 16 GB of 667 MHz DDR2 RAM.

Figure 3 shows the variation of RMSE with rank for several different sample sizes. The rank is plotted from 100 to 1000 in increments of 100, but we also add rank=1500 to see if there is a large change in calculated RMSE for a much larger value. Some important features to note here:

1. As in Figure 1, the RMSE decreases for larger sample sizes, but as was noted earlier, there is not a large difference between sample sizes of 40,000 and above.

2. For the non-pivoting matrix inversion techniques SR-Q and SR-V (not including SR-N), the RMSE increases beyond rank=800. This suggests that there might be some instability associated with non-pivoting methods as rank size becomes large. For this reason, one should stick with the pivoting methods (SR-QP or SR-VP) if one wishes to use a rank of 800 or larger.

3. On average it appears that SR-VP and SR-QP outperform the other rank reduc- 
tion methods. SR-VP also appears to outperform SR-QP, although the difference 
is marginal. 

4. SR-VP with rank=800 and sample size=40000 appear to be optimal choices for our data when looking at Figures 1-3, given the accuracy of the result. The timings are much longer for these pivoting methods as shown above, but they outperform all other methods.

6. Results 

6.1. SDSS Main Galaxy and LRG Results 

The SDSS MGS (Data Set 1) and LRG (Data Set 2) will give us different results because the LRG sample has far fewer SED types than are found in the SDSS MGS, while the LRG sample goes to fainter magnitudes and hence deeper redshifts (see Figures 8 and 9). This will make the job of any regression algorithm quite different. This is evident in the two panels of Figure 4, which show the variation of RMSE versus sample size for the two different data sets. A number of points need to be stressed:
1. Morphological Inputs. The morphological information (p50, p90, ci, fd, qr) may add some information that the regression algorithm can utilize. This includes the Petrosian 50% radii (p50), the Petrosian 90% radii (p90), the inverse concentration index (ci=p50/p90), the FracDev (fd) and Stokes Q parameter (qr), all in the SDSS r band. More details on these parameters are discussed in Paper I. In Data Set 1 (Figure 4(a)) the five SDSS filters u-g-r-i-z (not including morphology inputs) clearly outperform all of the subsets of u-g-r-i-z (g-r-i, u-g-r-i, and g-r-i-z) and the addition of morphological inputs. In Data Set 2 (Figure 4(b)) the morphological information appears to add noise for the most part, making the fits worse than by using only combinations of the five SDSS u-g-r-i-z bandpass filters.

2. Fewer SEDs. As mentioned in the previous section, by the time sample sizes of ~40,000 are reached in the SDSS-MGS of Data Set 1 (Figure 4(a)) the RMSE begins to level off. In the SDSS-LRG of Data Set 2 (Figure 4(b)), however, this is already occurring for most of the inputs in the 10,000-20,000 range. This is clearly the advantage of having fewer SEDs to worry about in the SDSS-LRG sample versus the SDSS-MGS. In fact for Data Set 2 (SDSS-LRG) it is clear that four of the five SDSS bandpasses are sufficient for the optimal fit (g-r-i-z). The SDSS u bandpass is clearly superfluous in the SDSS-LRG data set when using GP fitting routines.

3. Errors. 90% confidence levels derived from the bootstrap resampling are roughly at 
the level of the variation in each of the inputs used as a function of sample size. It is 
clear that adding morphological information requires larger error estimates for these 
datasets. 
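The RMSE values and bootstrap confidence levels referred to in the list above can be computed along the following lines. This is a minimal sketch under stated assumptions: the text does not specify the resampling details here, so a simple percentile bootstrap over the test galaxies is assumed, and the function names, the number of resamples, and the fixed seed are all hypothetical choices.

```python
import math
import random


def rmse(z_pred, z_true):
    """Root-mean-square error between predicted and spectroscopic redshifts."""
    n = len(z_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(z_pred, z_true)) / n)


def bootstrap_rmse_interval(z_pred, z_true, n_boot=1000, level=0.90, seed=0):
    """Percentile-bootstrap interval for the RMSE (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    n = len(z_true)
    stats = []
    for _ in range(n_boot):
        # Resample galaxies with replacement, keeping each (prediction, truth) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([z_pred[i] for i in idx], [z_true[i] for i in idx]))
    stats.sort()
    lo = stats[int((1.0 - level) / 2.0 * n_boot)]
    hi = stats[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lo, hi


# Hypothetical predictions and true redshifts for six test galaxies.
z_true = [0.05, 0.10, 0.12, 0.20, 0.08, 0.15]
z_pred = [0.06, 0.09, 0.15, 0.18, 0.08, 0.17]
point = rmse(z_pred, z_true)
low, high = bootstrap_rmse_interval(z_pred, z_true)
```

Resampling pairs rather than residuals keeps any dependence of the error on redshift intact, which seems the safer default when the error distribution is not known in advance.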



6.2. Cross-Matching GALEX and SDSS Results 

Figure 5 shows results from a cross-match of the SDSS and GALEX catalogs, which are listed as Data Sets 3 and 4 in Tables 1 and 2. Figure 7 shows the SDSS and SDSS + GALEX results for Data Sets 1-4, but without any SDSS morphological inputs included. This is to better quantify the differences between the SDSS and SDSS + GALEX GP fits. The following should be noted:

1. Comparing Figure 4(a) to Figure 5(a) one sees that those inputs that include SDSS morphological information are slightly improved when GALEX filters are included. The error bars on those with morphological inputs (errors not shown here) are also smaller in Figure 5(a) versus Figure 4(a). This would imply that the addition of GALEX filters helps make better use of the morphological inputs.

2. Figure 7(a) is made up of Figures 4(a), 5(a), and 6(a) without the SDSS morphological information included. One notices that Data Set 3 (SDSS-MGS + GALEX) in Figure 7(a) has higher RMSE values for the purely SDSS bandpasses (g-r-i, u-g-r-i, g-r-i-z, u-g-r-i-z) than Data Set 1 (SDSS-MGS only). Here the maximum size of the training data sets differs by a factor of 2.7 (80,000 versus 30,000), hence the difference may be attributed to a smaller data set size, although that is unlikely given how we subsample the data in Data Set 1. However, if one examines Figure 8 one sees clear differences and similarities in the magnitude and redshift distributions of these two catalogs. In particular the r-band magnitude distribution is quite distinct, the z-band less so. This seems to have made it harder for the GPs to obtain a good fit for the MGS galaxies. Within Data Set 3 of Figure 7(a) the GALEX bandpasses help with two of the SDSS-only input options (g-r-i and g-r-i-z) compared to Data Set 1. However, the two GALEX bandpasses do not help with the best inputs from Data Set 1 (u-g-r-i and u-g-r-i-z). Hence for the MGS galaxies there appears no need to utilize the GALEX magnitudes to improve photo-z estimation over that already obtained from SDSS-only magnitudes. The same applies to the SDSS morphological information, which adds very little of substance. For example, compare u-g-r-i-z in Data Set 1 (Figure 4(a)) versus nuv-fuv-u-g-r-i-z-p50-p90-fd-qr in Data Set 3 (Figure 5(a)).

3. Comparing Figures 4(b) and 5(b), one sees that the LRG + GALEX cross-match 
catalog has lower RMSE values than the LRG only catalog regardless of the inputs 
used. Hence one would be led to believe that one should always use GALEX magnitudes 
where available for LRG galaxies to improve photo-z estimation. However, there are 
two other things to take note of. First, one again sees that the maximum training data set 
size is a factor of 20 smaller for Data Set 4 than for Data Set 2 (4000 versus 80,000), although 
Data Set 2 does take a subsample at the level of Data Set 4. Therefore, sample size 
does not appear to be the issue here. Second, looking at Figure 9, it is clear that there are few 
similarities in the magnitude or redshift distributions for these two data sets. Clearly 
the GP algorithm is fitting a completely different set of data points, and it finds Data 
Set 4 much easier than Data Set 2. 

4. Looking at Figure 7(b) (made up of Figures 4(b), 5(b), and 6(b) without the SDSS 
morphological inputs included), the addition of the GALEX nuv-fuv filters within Data 
Set 4 seems to assist in photo-z estimation when using SDSS filters g-r-i and u-g-r-i, 
but has little effect when added to the already superior g-r-i-z and u-g-r-i-z. 

As noted above, the RMSE differences between Figures 4(a) and 5(a) suggest that the 
underlying distributions of SDSS magnitudes and redshifts of Data Sets 1 and 3 are different, 
as seen in Figure 8. The data set has shrunk in size between Data Sets 1 and 3, while the 
redshift distribution appears the same. However, the colors of the galaxies have changed 
enough that the GPs find it harder with the reduced sample size to obtain a good fit. 

The explanation for the improvement seen between Figures 4(b) and 5(b) (Data Sets 
2 and 4) is perhaps simpler. Figure 9 shows the u,r,z and redshift distributions for these 
two data sets. Clearly, the centroid, spread, and shape of the u,r,z and 
redshift distributions are significantly different. The LRG + GALEX redshift distribution in 
particular is strongly truncated beyond a redshift of about 0.2, while the magnitude 
distributions tend to be more Gaussian in shape. Certainly it is easier for GPs to come up with 
better fits for lower-redshift distributions. 

The marked differences between the SDSS MGS and LRG results are because of the 
different galaxy SEDs that exist in each catalog. These differences also exist because the 
LRG samples go fainter than the MGS samples (see Eisenstein et al. 2001) and they have a 
different redshift and galaxy magnitude distribution (see Figures 8 and 9). The magnitude 
and redshift differences between the pure LRG and LRG+GALEX catalogs are much larger 
than they are between the corresponding MGS and MGS+GALEX catalogs. Clearly the 
additional GALEX inputs affect the SDSS MGS only (u-g-r-i-z) results negatively, while the 
effect of the GALEX inputs on the LRG sample is ambiguous at best. These differences suggest 
that one must be very careful in interpreting the improvement in RMSE results associated 
with any SDSS + GALEX cross-match catalogs. 
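
The comparison behind Figures 8 and 9 (resample the larger catalog down to the cross-match size, then compare distributions) can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names `resample_to_match` and `ks_statistic` are hypothetical, the catalogs are synthetic stand-ins rather than SDSS or GALEX data, and the two-sample Kolmogorov-Smirnov statistic is our choice of summary, not a measure used in this paper.

```python
import numpy as np


def resample_to_match(values, target_size, rng):
    """Draw target_size entries from values without replacement."""
    values = np.asarray(values, dtype=float)
    if target_size > values.size:
        raise ValueError("target_size exceeds the catalog size")
    idx = rng.choice(values.size, size=target_size, replace=False)
    return values[idx]


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins, NOT survey data: a parent catalog and a smaller
    # "cross-match" that only keeps objects below a redshift of 0.2.
    parent_z = rng.uniform(0.0, 0.5, size=80_000)
    cross_z = parent_z[parent_z < 0.2][:4_000]
    matched_z = resample_to_match(parent_z, cross_z.size, rng)
    print("KS statistic, resampled parent vs cross-match:",
          round(ks_statistic(matched_z, cross_z), 3))
```

A statistic near zero would indicate that the cross-match samples the same distribution as the parent catalog; a large value signals that any RMSE difference may reflect the sample rather than the added bandpasses.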



6.3. Cross-Matching 2MASS and SDSS Results 

Figure 6 demonstrates our GPR results from a cross-match catalog containing the 
2MASS extended source catalog with the SDSS MGS (Data Set 5) and the SDSS LRG 
sample (Data Set 6). When Figure 6 is compared with Figure 4, the results in Figure 6 are 
significantly better for both cases. While it might be tempting to attribute this improvement 
to the inclusion of additional bandpasses in the analysis in Figure 6, it is important to take 
note of a variety of other important differences between the RMSE estimates in these two 
figures. 

1. For the SDSS only bandpasses (u-g-r-i-z) the RMSE drops significantly between Data 
Sets 1 and 5 (Figures 4(a) and 6(a)) and between Data Sets 2 and 6 (Figures 4(b) and 6(b)); 
see Figure 7 for another viewpoint. This drop is because the 2MASS galaxies tend to be 
brighter and at lower redshift, making the cross-match catalog between the 2MASS and SDSS 
also brighter and at lower redshift than the SDSS only catalog, especially for the case of the 
LRG cross-match samples (see Figures 10 and 11). 

2. Figure 6(b) (Data Set 6) has lower RMSE values compared to Figure 4(b) (Data Set 
2) regardless of input. It also appears to converge to a best fit RMSE very quickly in 
comparison to Data Set 5 (Figure 6(a)). 

3. In Figure 7(a) (focusing on Data Sets 1 and 5) it is clear that adding the 2MASS fluxes 
improves the RMSE fit regardless of which SDSS filters are combined with the 2MASS 
j-h-k bandpasses. 

4. In Figure 6(b) (Data Set 6) adding the 2MASS fluxes can improve the RMSE fit, but 
the conditions under which this improvement occurs are significantly different from 
those in Figure 6(a) (Data Set 5). Upon close inspection it can be seen that equivalent 
best results are obtained as the training sample reaches ~20,000 using g-r-i-z-j-h-k 
(dashed green). This shows that for Data Set 6, the u band adds little to the LRG 
sample. This is consistent with the behavior observed in Figure 4(b) (Data Set 2). 



6.4. Systematics 



In Figures 12 and 13 we plot the redshifts and residuals, respectively, for those data sets 
that yield the lowest RMSE. The actual RMSE is also indicated in each plot. There appears 
to be a systematic shift above the regression line for redshifts less than 0.1 and below the 
regression line between 0.1<z<0.2 for Data Sets 1, 3, and 5. This effect has been seen or 
discussed elsewhere (e.g., Ball et al. 2008; Wang et al. 2009). 


At low redshifts (z<0.1) the bias in the regression line seen in Figure 12 (Data Set 1) is 
probably caused by the lack of deep u-band data (see Figures 8 and 9). When supplemented 
by the GALEX data the bias looks to be slightly improved in Data Set 3 (see Figures 12 and 
13). The bias seen between redshifts of 0.1<z<0.2 for the SDSS-MGS data sets (Data 
Sets 1, 3, and 5) is probably due to degeneracies in the spectral features of those galaxies. This 
bias appears to be smaller with the addition of GALEX or 2MASS magnitudes, but it is still 
present nonetheless. 
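
One way to quantify the shifts described above is to bin the residuals by spectroscopic redshift and report the mean residual per bin. The sketch below is a minimal illustration, not the procedure used to produce Figures 12 and 13: the function name `binned_bias`, the bin edges, and the sign convention (residual = photometric minus spectroscopic redshift, so a positive mean means the photo-z is too high) are all assumptions made here.

```python
import numpy as np


def binned_bias(z_spec, z_phot, edges):
    """Return (lo, hi, count, mean residual, rmse) for each redshift bin.

    Residuals are z_phot - z_spec (an assumed sign convention). Bins are
    half-open, [lo, hi); objects outside the edges are ignored.
    """
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    edges = np.asarray(edges, dtype=float)
    resid = z_phot - z_spec
    which = np.digitize(z_spec, edges) - 1  # bin index of each object
    rows = []
    for k in range(edges.size - 1):
        sel = which == k
        n = int(sel.sum())
        if n:
            mean = float(resid[sel].mean())
            rmse = float(np.sqrt(np.mean(resid[sel] ** 2)))
        else:
            mean = rmse = float("nan")  # empty bin: nothing to report
        rows.append((float(edges[k]), float(edges[k + 1]), n, mean, rmse))
    return rows


if __name__ == "__main__":
    # Tiny made-up example, not data from this paper.
    demo = binned_bias([0.05, 0.05, 0.15, 0.15],
                       [0.06, 0.08, 0.14, 0.12],
                       [0.0, 0.1, 0.2])
    for lo, hi, n, mean, rmse in demo:
        print(f"{lo:.1f} <= z < {hi:.1f}: n={n}, bias={mean:+.3f}, rmse={rmse:.3f}")
```

A persistent positive mean below z=0.1 and a negative mean between 0.1 and 0.2 would correspond to the pattern noted for Data Sets 1, 3, and 5.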



6.5. Comparison with other work 

In Paper I, we attempted to make comparisons between our more primitive version of 
GPs (limited to 1000 training samples) and several other well-known methods that we ran 
ourselves (see Paper I, Tables 4-6), which included linear and quadratic regression, the neural 
network ANNz package by Collister & Lahav (2004), and our own neural network type code 
called Ensemble Modeling (E-Model). In Table 3 we give the reader some appreciation of the 
abilities of our updated GP method. We compare our new GP method with a representative 
sample of recent work on two easily comparable data sets: Data Sets 1 and 2, each using 
only u-g-r-i-z inputs. 



7. Conclusion 

We have demonstrated that, with new non-sparse matrix inversion techniques and a 
better choice of kernel (or transfer function if you prefer), GPR is a competitive way to 
obtain accurate photometric redshifts for low-redshift surveys such as the SDSS. However, 
several caveats must be noted regarding the estimation of photometric redshifts from 
combined catalogs of the SDSS and 2MASS as well as the SDSS and GALEX, as discussed in 
Section 6. 

The SDSS + 2MASS and SDSS + GALEX cross-match results are astoundingly good 
in some cases, but this occurs even when the only bandpasses used are the u-g-r-i-z of the 
SDSS cross-matched set. This is clearly a case where we are sampling a smaller range of 
redshifts and magnitudes, which makes the regression job easier regardless of the algorithm. 
This shows that one has to be careful when quoting "better" results from a cross-match of 
any catalog. 
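
The point that a narrower redshift range makes the regression easier regardless of algorithm can be illustrated with a deliberately simple toy model. The sketch below is an assumption-laden illustration and not a reproduction of our results: it uses ordinary linear regression rather than GPR, a single synthetic "color" equal to the redshift plus Gaussian noise, and the hypothetical helper names `fit_and_score` and `range_experiment`.

```python
import numpy as np


def fit_and_score(color, z, n_train):
    """Fit z ~ color on the first n_train objects; return RMSE on the rest."""
    slope, intercept = np.polyfit(color[:n_train], z[:n_train], deg=1)
    pred = slope * color[n_train:] + intercept
    return float(np.sqrt(np.mean((pred - z[n_train:]) ** 2)))


def range_experiment(z_max, n=20000, noise=0.05, seed=0):
    """RMSE for a synthetic sample with redshifts uniform on [0, z_max]."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, z_max, size=n)
    # Toy observable: the redshift itself plus measurement noise.
    color = z + rng.normal(0.0, noise, size=n)
    return fit_and_score(color, z, n_train=n // 2)


# Same noise level and same estimator; only the redshift range differs.
rmse_wide = range_experiment(0.4)
rmse_narrow = range_experiment(0.2)
print(f"RMSE for 0 < z < 0.4: {rmse_wide:.4f}")
print(f"RMSE for 0 < z < 0.2: {rmse_narrow:.4f}")
```

In this toy setting the narrower sample yields the lower RMSE even though nothing about the inputs or the estimator has improved, which is the same caution that applies when quoting results from a cross-match catalog.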

We also demonstrate that the addition of many SDSS morphological parameters does 
not systematically improve our regression results. For a low-redshift survey like the SDSS, 
it makes intuitive sense that the Petrosian radii would help given the angular-diameter-distance 
relation, but that does not appear to be the case here, unlike in other studies 
(e.g., Wadadekar 2005). 


The papers associated with this project and the code used to generate the results from 
this paper are available on the NASA Ames Dashlink Web site https://dashlink.arc.nasa.gov/algorithm/stal 

M.J.W. thanks Jim Gray, Ani Thakar, Maria SanSebastien, and Alex Szalay for their 
help in cross-matching the catalogs used herein. Thanks go to the Astronomy Department 
at Uppsala University in Sweden for their generous hospitality while part of this work was 
completed. M.J.W. acknowledges funding received from the NASA Applied Information 
Systems Research Program. A.N.S. thanks the NASA Aviation Safety Integrated Vehicle 
Health Management project for support in developing the GP-V method. The authors 
would like to acknowledge support for this project from the Woodward Fund, Department of 

Table 3. Photometric Redshift estimator comparisons for u-g-r-i-z inputs 

Method Name                  σ_rms^a                 Data Set^b       Source 
CWW                          0.0666                  MGS SDSS-EDR     Csabai et al. (2003) 
Bruzual-Charlot              0.0552                  MGS SDSS-EDR     Csabai et al. (2003) 
ClassX                       0.0340                  MGS SDSS-DR2     Suchkov et al. (2005) 
Polynomial                   0.0318                  MGS SDSS-EDR     Csabai et al. (2003) 
Kd-tree                      0.0254                  MGS SDSS-EDR     Csabai et al. (2003) 
Support vector machine       0.0270                  MGS SDSS-DR2     Wadadekar (2005) 
Artificial neural network    0.0229                  MGS SDSS-DR1     Collister & Lahav (2004) 
Nearest neighbor             0.0207                  MGS SDSS-DR5     Ball et al. (2008) 
                             0.0198                  MGS SDSS-DR5     Ball et al. (2008) 
Hybrid Bayesian              0.0275                  MGS SDSS-DR5     Wray & Gunn (2008) 
Linear regression            0.0283 0.0282 0.0284    MGS SDSS-DR3     Way & Srivastava (2006) 
Quadratic regression         0.0255 0.0255 0.0255    MGS SDSS-DR3     Way & Srivastava (2006) 
ANNz^c                       0.0206 0.0205 0.0208    MGS SDSS-DR3     Way & Srivastava (2006) 
Ensemble model               0.0201 0.0198 0.0205    MGS SDSS-DR3     Way & Srivastava (2006) 
Gaussian process 1000^d      0.0227 0.0225 0.0230    MGS SDSS-DR3     Way & Srivastava (2006) 
Gaussian process^e           0.0201 0.0200 0.0201    MGS SDSS-DR3     This work: Data Set 1 
Nearest neighbor             0.0243                  LRG SDSS-DR5     Ball et al. (2008) 
                             0.0223                  LRG SDSS-DR5     Ball et al. (2008) 
Hybrid                       0.0300                  LRG SDSS-DR3     Padmanabhan et al. (2005) 
Linear regression^f          0.0289 0.0289 0.0289    LRG SDSS-DR5     This work: Data Set 2 
Quadratic regression^f       0.0240 0.0240 0.0240    LRG SDSS-DR5     This work: Data Set 2 
ANNz^c                       0.0207 0.0205 0.0210    LRG SDSS-DR5     This work: Data Set 2 
Ensemble Model^f             0.0221 0.0220 0.0221    LRG SDSS-DR5     This work: Data Set 2 
Gaussian Process^e           0.0220 0.0217 0.0240    LRG SDSS-DR5     This work: Data Set 2 

^a The σ_rms cited here are for rough comparison only. No error bounds are included for the cited 
publications since many do not give error bounds or they are not handled in a consistent fashion across 
publications. For this paper's results, we quote the bootstrapped 50%, 10%, and 90% confidence levels 
as in Paper I. 

^b MGS = Main Galaxy sample, LRG = Luminous Red Galaxy sample, SDSS-EDR = SDSS Early 
Data Release (Stoughton et al. 2002), SDSS-DR1 = SDSS Data Release One (Abazajian et al. 2003), 
SDSS-DR2 = SDSS Data Release Two (Abazajian et al. 2004), SDSS-DR3 = SDSS Data Release Three 
(Abazajian et al. 2005), SDSS-DR5 = SDSS Data Release Five (Adelman-McCarthy et al. 2007). 

^c Uses the ANNz code of Collister & Lahav (2004). 

^d GP algorithm limited to 1000 training samples. 

^e GP algorithm SR-VP with 80,000 training samples and rank=800. 

^f See Paper I (Way & Srivastava 2006) for details on these algorithms. 
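
The three values quoted per method in Table 3 are bootstrapped 50%, 10%, and 90% levels of the RMSE. A simplified sketch of how such levels can be computed follows. It is an assumption-laden illustration: it resamples the test-set predictions of an already fitted model, whereas the bootstraps in this work resample the training set and refit, and the function names `rmse` and `bootstrap_rmse_levels` are hypothetical.

```python
import numpy as np


def rmse(z_true, z_pred):
    """Root-mean-square error between true and predicted redshifts."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    return float(np.sqrt(np.mean((z_pred - z_true) ** 2)))


def bootstrap_rmse_levels(z_true, z_pred, n_boot=100, levels=(50, 10, 90), seed=0):
    """Percentiles of the RMSE over bootstrap resamples of the test objects.

    Simplification: only the evaluation set is resampled (with replacement);
    the model is not refit for each bootstrap.
    """
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = z_true.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        stats[b] = rmse(z_true[idx], z_pred[idx])
    return tuple(float(np.percentile(stats, q)) for q in levels)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for spectroscopic and predicted redshifts.
    z_spec = rng.uniform(0.0, 0.3, size=5000)
    z_phot = z_spec + rng.normal(0.0, 0.02, size=5000)
    p50, p10, p90 = bootstrap_rmse_levels(z_spec, z_phot)
    print(f"RMSE 50%: {p50:.4f}  10%: {p10:.4f}  90%: {p90:.4f}")
```

The returned tuple follows the order used in Table 3 (50%, 10%, 90%).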

[Figure 1 panels: Data Set 1, u-g-r-i-z, rank=200 and rank=400] 

Fig. 1. — From Data Set 1 (see Table 1). Error bars are not plotted for reasons of clarity; 
however, they are of the same order as the scatter in the lines. 

[Figure 2 panel: Data Set 1, u-g-r-i-z, rank=200] 

Fig. 2. — From Data Set 1 (see Table 1), but unlike in Figure 1 we show that the matrix 
inversion times are linear out to the full size (180,000 galaxies) of the data set. 

[Figure 3 panels: Data Set 1, u-g-r-i-z, sample=20000, 40000, 60000, and 80000] 

Fig. 3. — From Data Set 1 (see Table 1). Error bars are not plotted for reasons of clarity; 
they are of the same order as the scatter in the lines. 

Fig. 4. From Data Sets 1 and 2 (see Table 1). We utilize the rank-reduction method termed SR-VP with a rank size of 
800. The training sets (n in the plot, following our earlier notation) range in size from 1000 to 80,000 in 1000 increments with 
10 bootstraps (Efron & Tibshirani 1993) per run. The testing sample size (n*) was always 20,229. The mean value of the 10 
bootstraps is plotted. The 90% confidence levels from the bootstrap resampling are of order the vertical line variation. Clearly, the 
errors are much larger for those which include the morphological parameters. 




Fig. 5. From Data Sets 3 and 4 (see Table 1). We utilize the rank-reduction method termed SR-VP with a rank size 
of 800. On the left in plot (a), we use training sets (n in the plot, following our earlier notation) ranging in size from 1000 to 
30,000 in 1000 increments with 10 bootstraps per run. The testing sample size (n*) is 3374. The mean value of the 10 bootstrap 
resampling runs is plotted. The 90% confidence levels from the bootstrap resampling are of order the vertical line variation. On the 
right, we use similar notation, but we have smaller training (1000 to 4000 in increments of 1000) and testing (454) sets. 

Fig. 6. — From Data Sets 5 and 6 (see Table 1). We utilize the rank-reduction method 
termed SR-VP with a rank size of 800. For Data Set 5 the training sets (denoted as n) range 
in size from 1000 to 80,000 in 1000 increments with 10 bootstraps per run and a testing-set 
(n*) size of 15,050. On the right, Data Set 6 training sets range from 1000 to 40,000 in 
increments of 1000 with 10 bootstraps per run and a testing-set size of 4420. Bootstrap 90% 
confidence levels are again of order the vertical line variation. 




Fig. 7. — From Data Sets 1-6 (see Table 1). The SDSS u-g-r-i-z filter combinations alone 
are shown along with those including the GALEX nuv, fuv filters and the 2MASS j,h,k filters. This 
demonstrates how the addition of the GALEX and 2MASS filters influences the SDSS only 
magnitude fits via the GP SR-VP method. 

Fig. 8. — Overlapping histograms for Data Sets 1 and 3 (see Table 1) from three of the five 
SDSS magnitudes (u,r,z). Data Set 1 is in blue and Data Set 3 in magenta. Of course, the 
SDSS+GALEX cross-match catalogs (Data Set 3) are smaller, so the SDSS only data (Data 
Set 1) was randomly resampled to be the same size as the cross-match catalog so that trends 
in the plots are directly comparable. 

Fig. 9. — Overlapping histograms for Data Sets 2 and 4 (see Table 1) from three of the five 
SDSS magnitudes (u,r,z). Data Set 2 is in blue and Data Set 4 in magenta. Of course, the 
SDSS+GALEX cross-match catalogs (Data Set 4) are smaller, so the SDSS only data (Data 
Set 2) was randomly resampled to be the same size as the cross-match catalog so that trends 
in the plots are directly comparable. 

Fig. 10. — Overlapping histograms for Data Sets 1 and 5 (see Table 1) from three of the 
five SDSS magnitudes (u,r,z). Data Set 1 is in blue and Data Set 5 in red. Of course, the 
SDSS+2MASS cross-match catalogs (Data Set 5) are smaller, so the SDSS only data (Data 
Set 1) was randomly resampled to be the same size as the cross-match catalog so that trends 
in the plots are directly comparable. 

Fig. 11. — Same as Figure 10 except we use Data Sets 2 (blue) and 6 (red). 

Fig. 12. — Spectroscopic redshift plotted against predicted photometric redshift for the best 
performing input from each of the data sets in Table 1. 

Fig. 13. — Residuals as a function of spectroscopic redshift for the best performing input 
from each of the data sets in Table 1. 



